Triplicate readings are of wide occurrence in experimental work. Occasionally, however, only the closest pair of a triad is used, and the outlying high or low one is discarded as evidencing some gross error. The present paper presents a mathematical investigation leading to precise determination of some of the biases that result from such selection. This project was suggested by certain experiments involving random sampling numbers and analysis of published chemical determinations. The theoretical findings agree closely with the empirical results and imply that selected pairs not only tend to overestimate considerably the precision of the experimental procedure, but also result in less accurate determinations.



1. Introduction 

Triplicate determinations are fairly common in the chemical laboratory, inasmuch as a third one is occasionally taken to indicate which of the other two is more likely to be off the mark. A corollary of this is that if only two of the three measurements are in close agreement, the worker is under strong temptation to discard the remaining distant one completely, on the ground that evidence of gross error is present. A similar practice also appears to be encouraged by instruction methods in quantitative chemical analysis that grade students not only on the correctness of their results, made in duplicate, but also on their precision as measured by the difference between the two results. Thus, a student might hope to improve his record by quietly making a third, uncalled-for analysis, giving himself the advantage of the closest two of all three, and omitting to mention the remaining one. This is a very striking case of the long-standing problem of the rejection of outlying observations and raises the statistical question of how estimates of the mean and variability of analyses are affected by such procedures. It is this question, rather than the rejection of outlying observations,¹ ² that is emphasized in the present investigation, although the rejection problem is also touched on in connection with the first of the three statistics, $y_1$, discussed below.

The author is indebted to W. J. Youden for drawing his attention to this question and suggesting its theoretical investigation when a search of the statistical literature indicated that this apparently simple problem had not been considered heretofore.³

Accordingly, the study was carried out and resulted in the present paper, which is a purely mathematical treatment undertaken to verify and extend certain sampling results obtained by Youden in an empirical investigation of the above question, which were reported in the National Bureau of Standards Technical News Bulletin for July 1949 [3]. The method of treatment was to study some of the properties of two measurements that are selected out of a sample of three according to a stated criterion computed from the sample observations. The statistics that define such properties are of a more general character than order statistics, that is, observations ordered according to size, such as the largest value in a sample, the sample median, etc. Whereas order statistics are widely treated in the literature,⁴ the type of statistics considered here, which depend on features other than size, have apparently received relatively little attention.⁵

1 Figures in brackets indicate the literature references at the end of this paper.

2 For information on the many aspects of outlying observations that have been treated in the literature, the reader is advised to consult a recent article by F. E. Grubbs [1], in which, in addition to discussing several new criteria for testing discordant observations, he presents a detailed bibliography of the problem. A particularly comprehensive survey of developments prior to 1933 is provided in a study by P. R. Rider [2] published in that year. See also the two papers by W. J. Dixon [7, 8] and the one by G. R. Seth [10].

This report is thus limited to the following three questions, answers to which will serve to throw light on the differences to be expected between taking two measurements at random ("true duplicates") and taking two measurements that are really part of a random sample of three.⁶ (1) In a random sample of three observations from a single (continuous) population, what values of the following ratio may be considered significant: the ratio of the gap between the two closest values to the whole range of the sample? (2) How does the range in a sample of true duplicate measurements compare with the difference between the two values in each of (a) the closest pair out of a sample of three measurements, (b) the lowest (or highest) pair out of such a sample, and (c) the pair of extremes (highest and lowest values) of the entire sample of three, as regards several types of universes? (3) How do the means compare in each case? It is not intended to consider other problems that can arise, such as drawing the sample from a mixed population, or adopting a rule to omit the extreme measurement only when the range of the three observations exceeds a specified value and to use all three of them otherwise. Neither is it intended in this paper to go into any other statistical questions, such as estimation and significance tests or more general decision problems.

3 After this paper was prepared, the author received a copy of a manuscript of an article by Franklin M. Henry of the University of California, Berkeley, entitled "The loss of precision from discarding discrepant data". This article has since been published [11]. It presents no mathematical theory for triads, but gives, among other interesting points, a discussion of an experiment in judging 10-second time intervals by a series of triplicate measurements in which the two "closest" were averaged in each triad. The standard deviation of the mean of such averages for 50 triads was 0.131 sec, whereas theory (table 1b, Part B, col. 1) gives (since the standard deviation of the whole set of 150 readings is σ = 0.162 instead of the σ = 1 used in our table) the remarkably close value 0.7986 × 0.162 = 0.129 sec for samples from a normal population and 0.9083 × 0.162 = 0.147 sec for samples from a rectangular population (col. 4). The author is obliged to Henry for his kindness in making his paper available in advance of publication.

Attention is also called to a note by G. R. Seth [10] on the distribution of the two closest among a set of three observations. Seth became interested in the problem in the course of a discussion with the author during his visit to the Statistical Engineering Laboratory in the spring of 1948. In this note he obtains in general terms some of the results also given in the present paper and applies them to the normal distribution. The author wishes to acknowledge that the present paper has benefited from correspondence with Seth on the problem. (In this connection see also footnotes 11 and 16.)

4 For a comprehensive survey of the literature on order statistics, see Wilks [4].

5 The most directly relevant article known to the author is by J. W. Tukey [5], in which he obtains tables relative to the distribution of the largest gap, rather than the smallest, in samples of from 2 to 10 by experimental sampling from a unit normal universe and also by analytical means.

6 The answers to these questions are indicated in the tables as follows: (1), tables 2 and 1a; (2) and (3), table 1b. These tables, which are an attempt to condense the main results of this paper, are summarized in section 2.

2. Summary 

The answers to the above questions involve primarily the investigation of the distributions of the three statistics, $y_1$, $y_2$, and $y_3$, whose main properties are summarized in tables 1a and 1b below and compared with the results both of actual sampling, making use of a table of random numbers, and of data on chemical analyses that appeared in the chemical literature.⁷

The statistics $y_i$ are defined as follows. Let $x_1, x_2, x_3$ be the sample of three observations arranged in order of increasing magnitude:

$$x_1 \le x_2 \le x_3.$$

Let now $x', x'', x'''$ designate the same three observations rearranged so that $x'$ and $x''$ are the two closest of the three and $x' > x''$. Then the selected statistics treated are

$$y_1 = \frac{x' - x''}{x_3 - x_1}, \qquad y_2 = \tfrac12(x' - x''), \qquad y_3 = \tfrac12(x' + x'').$$

Results are presented, insofar as they have been obtained, for three parent universes, rectangular, right triangular, and normal, though not necessarily in the same detail for each one.

7 For additional comparisons with experimental data that came to the author's attention too late for inclusion in the main body of the paper, see footnote 3.

The comparisons indicated in table 1 reveal the following facts for random samples of three measurements, where, unless otherwise stated, the statements apply to samples from a normal or a rectangular population:

1. The empirical sampling results, obtained prior to the theoretical calculations, show fairly substantial agreement with the theory. The chemical data from experimental determinations reported in a chemical journal and studied by W. J. Youden are likewise in agreement.⁸

2. The statistic $y_1$, which characterizes the partition of the range by the middle item in a random sample of three measurements, behaves remarkably similarly for samples from three different basic populations, the normal, rectangular, and right triangular (table 1a). This suggests that this ratio statistic will not be very useful as a criterion for discriminating between a normal population and some other population.

3. A set of two observations selected by taking 
the closest two out of three from a normal or a 
rectangular population differs strikingly from other 
pairs taken from the three or from a pair of true 
duplicates, as shown by the following: 

a. The average difference (as measured by $y_2$) between the selected pair is less than half that for the true duplicates, and the same is true of the variability of this distance as measured by the standard deviation (table 1b, Part A, cols. 1, 3 and 4, 6). Furthermore, the difference between the selected pair behaves (again in an average sense) very much like half the difference between the two lowest (or highest) in the full sample of three, and (in the same sense) is similar to one-quarter of the difference between the two most extreme measurements in the sample. The standard deviation of the difference between the closest pair is, however, comparable to the standard deviation of half the range (table 1b, Part A, cols. 1, 2, and 4, 5).

b. The mean ($y_3$) of a selected pair varies somewhat more than the mean of a true duplicate pair, the average value of both these means being the

8 For other empirical evidence see footnote 3. 



Table 1a. Characteristics of the ratio $y_1$ of the distance between the closest pair to the range in a sample of three measurements

Columns: (1) normal population, theory; (2) normal population, sampling with random numbers; (3) rectangular population, theory; (4) rectangular population, sampling with random numbers; (5) right triangular population, theory; (6) right triangular population, sampling with random numbers; (7) published chemical data.

Characteristic | (1) | (2) | (3) | (4) | (5) | (6) | (7)
N, number of samples of 3 | | | | 200 | | 200 |
Probability density function | $\dfrac{3\sqrt3}{\pi(y_1^2 - y_1 + 1)}$, $0 < y_1 < \tfrac12$ | | $2$, $0 < y_1 < \tfrac12$ | | $2$, $0 < y_1 < \tfrac12$ | |
Expected or mean value | 0.2621 | 0.2582 | 0.25 | 0.250? | 0.25 | 0.2441 | 0.2573
Standard deviation | 0.1428 | 0.1421 | 0.1443 | 0.1612 | 0.1443 | 0.1480 | 0.1565
same. Especially noteworthy is the fact that the true average of the population is more accurately estimated by using the two most discrepant observations of the three in forming an average, $\tfrac12(x_1 + x_3)$, than by taking the two that are most in agreement, although neither method is as accurate as taking the mean of all three (table 1b, Part B, cols. 1, 4). Thus, selection of measurements on the basis of close agreement increases rather than decreases the true error of measurement.
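The comparisons in points 3a and 3b can be reproduced approximately by simulation, in the spirit of the random-number sampling mentioned above. The Python sketch below is an illustration only, not the author's procedure: it assumes a unit normal parent, uses an arbitrary seed and sample size, and its function names (`closest_pair`, `simulate_normal`) are hypothetical. It estimates $E(x' - x'')$ and the standard deviations of the closest-pair mean $y_3$, of $\tfrac12(x_1 + x_3)$, of the mean of all three, and of the mean of a true duplicate pair.

```python
import random
import statistics


def closest_pair(triad):
    """Return (x'', x'), the two closest of three values, lower one first."""
    x1, x2, x3 = sorted(triad)
    return (x1, x2) if x2 - x1 < x3 - x2 else (x2, x3)


def simulate_normal(n_triads=60_000, seed=12345):
    """Monte Carlo estimates of some table 1b entries for a unit normal parent."""
    rng = random.Random(seed)
    gap, mid_closest, mid_extremes, mean_all, mean_dup = [], [], [], [], []
    for _ in range(n_triads):
        triad = [rng.gauss(0.0, 1.0) for _ in range(3)]
        lo, hi = closest_pair(triad)
        x1, _, x3 = sorted(triad)
        gap.append(hi - lo)                    # x' - x'' = 2 y2
        mid_closest.append((lo + hi) / 2.0)    # y3, mean of the closest pair
        mid_extremes.append((x1 + x3) / 2.0)   # mean of the two most discrepant values
        mean_all.append(sum(triad) / 3.0)      # mean of all three
        # an independent pair of "true duplicates"
        mean_dup.append((rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0)) / 2.0)
    return {
        "E(x'-x'')": statistics.fmean(gap),
        "sd(y3)": statistics.pstdev(mid_closest),
        "sd((x1+x3)/2)": statistics.pstdev(mid_extremes),
        "sd(mean of 3)": statistics.pstdev(mean_all),
        "sd(mean of 2)": statistics.pstdev(mean_dup),
    }


if __name__ == "__main__":
    for name, value in simulate_normal().items():
        print(f"{name:>14s}: {value:.4f}")
```

With 60,000 triads the estimates should fall within about 0.01 of the table 1b values 0.4535, 0.7986, 0.6018, 0.5774 and 0.7071, and the standard deviations should rank as in the table: the closest-pair mean is the most variable and the mean of all three the least.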

In addition to the above relationships, the behavior of the outlying observation $x'''$ is of interest



Table 1b. Characteristics of other statistics related to the closest pair of measurements in a sample of 3

$x_1 < x_2 < \dots < x_n$ denote the measurements in a sample of $n$ ordered according to size. If $n = 3$, then $x'$ and $x''$, $x' > x''$, denote the two closest measurements in the sample $(x_1, x_2, x_3)$. The measurements are drawn independently at random from the populations designated. The rectangular population has been adjusted to unit variance and centered at the origin. Exact values and distribution functions are given where practical. Where the interval of nonzero probability density is omitted for a probability distribution, the variate is assumed to take all values from $-\infty$ to $+\infty$. For fuller explanation see text.

Columns: normal population, $f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$: (1) closest pair in a sample of 3; (2) lowest pairᵃ in a sample of 3; (3) sample of 2 ("true duplicates"). Rectangular population with unit variance, $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$: (4) closest pair in a sample of 3; (5) lowest pairᵃ in a sample of 3; (6) sample of 2 ("true duplicates").

A. Statistics relative to the DISTANCE between two values

Statistic | (1) $x' - x'' = 2y_2$ | (2) $(x_2 - x_1)/2 = s$ | (3) $(x_2 - x_1)/2 = p$ | (4) $x' - x'' = 2y_2$ | (5) $(x_2 - x_1)/2 = s$ | (6) $(x_2 - x_1)/2 = p$
Interval of nonzero probability density | $0 < x' - x'' < \infty$ | $0 < s < \infty$ | $0 < p < \infty$ | $0 < x' - x'' < \sqrt3$ | $0 < s < \sqrt3$ | $0 < p < \sqrt3$
Mean | $E(x' - x'') = 0.4535$; 0.4451 (experimental valueᵇ) | $E(s) = \frac{3}{4\sqrt\pi} = 0.4231$ | $E(p) = \frac{1}{\sqrt\pi} = 0.5642$ | $E(x' - x'') = \frac{\sqrt3}{4} = 0.4330$ | $E(s) = \frac{\sqrt3}{4} = 0.4330$ | $E(p) = \frac{1}{\sqrt3} = 0.5774$
Standard deviation | $\sigma(x' - x'') = 0.3746$ | $\sigma(s) = 0.3379$ | $\sigma(p) = (\frac12 - \frac1\pi)^{1/2} = 0.4263$ | $\sigma(x' - x'') = 0.3354$ | $\sigma(s) = 0.3354$ | $\sigma(p) = \frac{1}{\sqrt6} = 0.4082$

Other means for comparison: normal (cols. 1, 2), $E(x_3 - x_1)/4 = \frac{3}{4\sqrt\pi} = 0.4231$; rectangular (cols. 4, 5), $E(x_3 - x_1)/4 = \frac{\sqrt3}{4} = 0.4330$.
Other standard deviations for comparison: normal (cols. 1, 2), $\sigma(x_3 - x_1)/2 = 0.4442$; rectangular (cols. 4, 5), $\sigma(x_3 - x_1)/2 = \frac{\sqrt{15}}{10} = 0.3873$.

B. Statistics relative to the AVERAGE of two values

Statistic | (1) $(x' + x'')/2 = y_3$ | (2) $(x_1 + x_2)/2 = q$ | (3) $(x_1 + x_2)/2 = m$ | (4) $(x' + x'')/2 = y_3$ | (5) $(x_1 + x_2)/2 = q$ | (6) $(x_1 + x_2)/2 = m$
Probability density function | $p(y_3) = \frac{6}{\pi}\int_0^\infty e^{-(y_2^2 + y_3^2)} \int \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt\,dy_2$, where $t$ ranges over $(-\infty, y_3 - 3y_2)$ and $(y_3 + 3y_2, \infty)$ | ᶜ | $\frac{1}{\sqrt\pi}e^{-m^2}$ | ᶜ | ᶜ | $\frac13(\sqrt3 - |m|)$, $-\sqrt3 < m < \sqrt3$
Mean | $E(y_3) = 0$; $-0.0335$ (experimental valueᵇ) | $E(q) = -\frac{3}{4\sqrt\pi} = -0.4231$ | $E(m) = 0$ | $E(y_3) = 0$ | $E(q) = -\frac{\sqrt3}{4} = -0.4330$ | $E(m) = 0$
Standard deviation | $\sigma(y_3) = (\frac12 + \frac{\sqrt3}{4\pi})^{1/2} = 0.7986$; 0.8098 (experimental valueᵇ) | $\sigma(q) = 0.6244$ | $\sigma(m) = \frac{1}{\sqrt2} = 0.7071$ | $\sigma(y_3) = 0.9083$ | $\sigma(q) = \sqrt{33/80} = 0.6423$ | $\sigma(m) = \frac{1}{\sqrt2} = 0.7071$

Other means for comparison: normal and rectangular (cols. 1, 2, 4, 5), $E(x_1 + x_3)/2 = 0$ and $E(x_1 + x_2 + x_3)/3 = E\bar x = 0$.
Other standard deviations for comparison: normal (cols. 1, 2), $\sigma(x_1 + x_3)/2 = (\frac12 - \frac{\sqrt3}{4\pi})^{1/2} = 0.6018$ and $\sigma(x_1 + x_2 + x_3)/3 = \sigma_{\bar x} = \frac{1}{\sqrt3} = 0.5774$; rectangular (cols. 4, 5), $\sigma(x_1 + x_3)/2 = \sqrt{3/10} = 0.5477$ and $\sigma_{\bar x} = \frac{1}{\sqrt3} = 0.5774$.

a The characteristics of the highest pair are obtainable from symmetry considerations.
b Values obtained by sampling experiments using a table of random normal deviates.
c These density functions have been omitted since they are rather complicated.
and will be briefly considered.

Although the basic ideas present little difficulty, 
the explicit values and probability distributions 
needed in this paper often involve calculation of 
multiple integrals over quite complicated regions. 
The exact calculation of these integrals has usually 
required much tedious manipulation, too lengthy 
to warrant more than the briefest indication. A 
detailed manuscript of these procedures is in the 
possession of the author. 

3. Derivation of Results; Descriptive Properties

3.1. The Statistic $y_1$
a. Distribution and moments in general 

Let $x_1, x_2, x_3$ be the three observations, arranged in order of magnitude, in a random sample of three from a population with pdf (probability density function) $f(x)$, supposed continuous (and differentiable as often as necessary), and suppose $f(x)$ is nonzero in the interval $(a, b)$, where either or both endpoints may be at infinity. Then the joint density function of $x_1, x_2, x_3$ is [6]

$$p(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = 3!\,f(x_1)f(x_2)f(x_3)\,dx_1\,dx_2\,dx_3, \qquad a < x_1 < x_2 < x_3 < b. \tag{1}$$

Letting $x' > x''$ be the two closest observations, the statistic $y_1$ may be written

$$y_1 = \frac{x' - x''}{x_3 - x_1} = y_{11} \text{ or } y_{12},$$

where

$$y_{11}(x_1, x_2, x_3) = \frac{x_2 - x_1}{x_3 - x_1}, \qquad y_{12}(x_1, x_2, x_3) = \frac{x_3 - x_2}{x_3 - x_1} = 1 - y_{11}(x_1, x_2, x_3) \tag{1a}$$

are simply functions of the $x$'s and will be used with the arguments often omitted for brevity. Thus it is required to find the distribution of the variate $y_1$, defined over $0 < y_1 < \tfrac12$, which takes different functional forms, namely

$$y_1 = \begin{cases} y_{11}(x_1, x_2, x_3) & \text{if } 0 < y_{11}(x_1, x_2, x_3) < \tfrac12, \\ y_{12}(x_1, x_2, x_3) & \text{if } 0 < y_{12}(x_1, x_2, x_3) < \tfrac12, \end{cases}$$

where $y_{11}$, $y_{12}$ are simply used as abbreviations for the fractions in (1a).

To find the distribution of $y_1$, we have (in the notation of the theory of probability), since the events indicated on the right are mutually exclusive,

$$P\{y_1 < Y\} = P\{0 < y_{11} < Y,\ 0 < y_{11} < \tfrac12\} + P\{0 < y_{12} < Y,\ 0 < y_{12} < \tfrac12\}, \tag{2}$$

which is equivalent to

$$P\{y_1 < Y\} = \begin{cases} 0 & \text{if } Y < 0, \\ P\{0 < y_{11} < Y\} + P\{0 < y_{12} < Y\} & \text{if } 0 < Y < \tfrac12, \\ 1 & \text{if } Y > \tfrac12. \end{cases} \tag{2a}$$

Equation (2a) can be differentiated with respect to $Y$ to give the probability density function in the form

$$p(y_1) = \begin{cases} p_1(y_{11}) + p_2(y_{12}), & 0 < y_{11},\ y_{12} < \tfrac12, \\ 0, & \text{otherwise}, \end{cases} \tag{2b}$$

with $y_{11}$ and $y_{12}$ replaced by $y_1$ in the result. Thus the required distribution is reduced to those of statistics of the usual type.

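To find $p_1(y_{11})$, apply the transformation⁹

$$x_1 = r - q, \qquad x_2 = r - q(1 - y_{11}), \qquad x_3 = r \tag{3}$$

to (1), obtaining

$$h(y_{11}, q, r)\,dy_{11}\,dr\,dq = 6q\,f(r - q)\,f[r - q(1 - y_{11})]\,f(r)\,dy_{11}\,dr\,dq, \qquad 0 < q < r - a,\ a < r < b,\ 0 < y_{11} < 1, \tag{4}$$

whence the pdf of the variate $y_{11}$ is

$$p_1(y_{11}) = \int_a^b \int_0^{r-a} 6q\,f(r - q)\,f(r)\,f[r - q(1 - y_{11})]\,dq\,dr = p_1(1 - y_{12}), \qquad 0 < y_{11} < 1. \tag{5}$$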

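Since $y_{12} = 1 - y_{11}$, its density function is, similarly,

$$p_2(y_{12}) = \int_a^b \int_0^{r-a} 6q\,f(r - q)\,f(r)\,f(r - q\,y_{12})\,dq\,dr, \qquad 0 < y_{12} < 1. \tag{6}$$

Hence finally (2b) gives

$$p(y_1) = \int_a^b \int_0^{r-a} 6q\,f(r - q)\,f(r)\,\phi(y_1)\,dq\,dr, \qquad 0 < y_1 < \tfrac12, \tag{7}$$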



9 This is obtained by putting $y_{11} = (x_2 - x_1)/(x_3 - x_1)$, $q = x_3 - x_1$, and $r = x_3$.
where

$$\phi(y_1) = f(r - q\,y_1) + f[r - q(1 - y_1)], \tag{8}$$

as the general formula for the distribution of $y_1$ for a population with continuous pdf $f(x)$.
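Formula (7) can be checked numerically for any given parent density. The following Python sketch is a minimal illustration, not part of the original derivation: it evaluates the double integral in (7) with a plain midpoint rule, and for the unit normal it assumes the infinite interval can be truncated to $(-8, 8)$, which discards only a negligible tail. The helper names (`p_y1`, `uniform`, `normal`, `normal_closed_form`) are hypothetical. For the square universe it should return 2, and for the normal it should agree with the closed form (9) derived in section 3.1, d.

```python
import math


def p_y1(f, y1, a, b, n=200):
    """Formula (7) by the midpoint rule, n points in each direction:
    p(y1) = int_a^b int_0^(r-a) 6 q f(r-q) f(r) phi(y1) dq dr, phi from (8)."""
    hr = (b - a) / n
    total = 0.0
    for i in range(n):
        r = a + (i + 0.5) * hr
        hq = (r - a) / n
        inner = 0.0
        for j in range(n):
            q = (j + 0.5) * hq
            phi = f(r - q * y1) + f(r - q * (1.0 - y1))   # equation (8)
            inner += 6.0 * q * f(r - q) * f(r) * phi
        total += inner * hq
    return total * hr


def uniform(x):
    """Square universe: f(x) = 1 on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def normal(x):
    """Unit normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_closed_form(y1):
    """Equation (9)."""
    return 3.0 * math.sqrt(3.0) / (math.pi * (1.0 - y1 + y1 * y1))


if __name__ == "__main__":
    print(p_y1(uniform, 0.3, 0.0, 1.0))              # close to 2
    for y in (0.05, 0.2, 0.45):
        print(y, p_y1(normal, y, -8.0, 8.0), normal_closed_form(y))
```

The midpoint rule is chosen only for transparency; any two-dimensional quadrature would serve.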

The above expressions appear to lend themselves to but few general statements. Thus, it may be seen from (7) and (8) that for a rectangular parent population $f(x)$ the distribution $p(y_1)$ of $y_1$ is rectangular. Furthermore, $y_1$ will evidently be rectangularly distributed for all parent distributions for which the function $\phi(y_1)$ does not depend upon $y_1$, that is, for which the function

$$\phi(y_1) = f(r - q\,y_1) + f[r - q(1 - y_1)]$$

depends at most upon $r$ and $q$. If $f$ is a linear function (triangular or rectangular distribution), this is seen to be true, for the $y_1$'s cancel out. Conversely, by differentiating with respect to $y_1$, it can be shown that if $y_1$ has a rectangular distribution, then $f$ must be linear, if differentiable.

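For future use, it is desirable to obtain general expressions for the moments, $\mu_k$, of $y_1$. In view of (1a), (2b), and (6) these are given by

$$\mu_k = \int_0^{1/2} y_{11}^k\,p_1(y_{11})\,dy_{11} + \int_0^{1/2} y_{12}^k\,p_1(1 - y_{12})\,dy_{12},$$

which, under the transformations

$$y_{11} = \tfrac12 - t, \qquad y_{12} = \tfrac12 - t,$$

become

$$\mu_k = \int_0^{1/2} \left(\tfrac12 - t\right)^k \left[p_1\!\left(\tfrac12 - t\right) + p_1\!\left(\tfrac12 + t\right)\right] dt,$$

in which $p_1$ is the pdf of the ratio

$$y_{11} = \frac{x_2 - x_1}{x_3 - x_1}.$$

If the function $p_1(u)$ is one that is symmetrical about $u = \tfrac12$, then $\mu_k$ may be written, putting $\tfrac12 - t = s$,

$$\mu_k = 2\int_0^{1/2} s^k\,p_1(s)\,ds.$$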

For certain symmetrical universes, the distribution of $y_{11}$ has been investigated numerically by W. J. Dixon [7] for samples of three and various larger sizes as well.¹⁰

b. Rectangular universe 

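For the rectangular or uniform parent universe given by

$$f(x) = 1, \qquad 0 < x < 1,$$

and zero elsewhere (this simple form is called the "square" universe), the general expression (7) becomes

$$p(y_1) = 2\int_0^1 \int_0^r 6q\,dq\,dr = 2, \qquad 0 < y_1 < \tfrac12,$$

verifying that the ratio $y_1$ also is rectangular. The first two moments are $E(y_1) = \tfrac14$ and $\sigma^2(y_1) = \tfrac{1}{48}$, so that $\sigma(y_1) = 1/(4\sqrt3) = 0.1443$.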

It is interesting to see whether values of the ratio $y_1$ tend to depend on the spread, $x_3 - x_1$, of the sample values.

It can be shown by the method used in obtaining (4) that, for the rectangular case, the joint probability density function of $y_1$, $x_1$, $x_3$ is

$$f(y_1, x_1, x_3) = 12(x_3 - x_1), \qquad 0 < x_1 < x_3 < 1,\quad 0 < y_1 < \tfrac12.$$

Since this is independent of $y_1$, it follows that the ratio $y_1$ is independent, not only of the range, but also of both sample extremes $x_1$ and $x_3$.

c. Triangular universe 

In simplest form this is given by

$$f(x) = 2x, \qquad 0 < x < 1,$$

and zero elsewhere. Formula (7) here gives

$$p(y_1) = \int_0^1 \int_0^r 48\,q\,(r - q)\,r\,[r + (r - q)]\,dq\,dr = 2, \qquad 0 < y_1 < \tfrac12,$$

so that the distribution of $y_1$ is identical to that of the previous case.
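A small simulation illustrates both results: the ratio $y_1$ has the same rectangular distribution on $(0, \tfrac12)$ for the square and the right triangular parent, and for the square parent it is uncorrelated with the range, as the independence result above requires. This Python sketch is an illustration under stated assumptions: the right triangular variate is drawn as the square root of a uniform variate (inverting $F(x) = x^2$), the seed and sample size are arbitrary, and the function names are hypothetical.

```python
import math
import random
import statistics


def y1_of(triad):
    """Ratio y1 = (x' - x'') / (x3 - x1): closest gap over the range."""
    x1, x2, x3 = sorted(triad)
    return min(x2 - x1, x3 - x2) / (x3 - x1)


def simulate_y1(draw, n_triads=50_000, seed=7):
    """Sample y1 and the range x3 - x1 for n_triads samples of three."""
    rng = random.Random(seed)
    ys, ranges = [], []
    for _ in range(n_triads):
        triad = [draw(rng) for _ in range(3)]
        ys.append(y1_of(triad))
        ranges.append(max(triad) - min(triad))
    return ys, ranges


def correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def square(rng):
    """f(x) = 1 on (0, 1)."""
    return rng.random()


def right_triangular(rng):
    """f(x) = 2x on (0, 1), sampled by inverting F(x) = x**2."""
    return math.sqrt(rng.random())


if __name__ == "__main__":
    for name, draw in (("square", square), ("right triangular", right_triangular)):
        ys, ranges = simulate_y1(draw)
        print(f"{name:>16s}: mean {statistics.fmean(ys):.4f}, "
              f"sd {statistics.pstdev(ys):.4f}, corr with range {correlation(ys, ranges):+.4f}")
```

Both parents should give a mean near 0.25 and a standard deviation near 0.1443, matching columns 3 and 5 of table 1a.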

10 In addition, Dixon has published a paper [8] that gives a thorough treatment of a large number of measures that may be used in testing whether an outlying observation (or several such) should be rejected. Of these statistics, the only one that has any direct relationship to any studied in the present paper is, for $n = 3$,

$$r_{10} = \frac{x_n - x_{n-1}}{x_n - x_1} = \frac{x_3 - x_2}{x_3 - x_1}.$$

This expression, which (for $n = 3$) is the same as $y_{12}\ (= 1 - y_{11})$ in (1a) above, is mentioned by Dixon as a criterion for testing the upper outlier $x_3$. The author is obliged to Dixon for making his two papers available in advance of publication.



Figure 1. Distribution of ratio $y_1$ in samples from a normal and from a rectangular universe. (Plot of the density of $y_1 = (x' - x'')/(x_3 - x_1)$, $x' > x''$, $x_3 > x_2 > x_1$, over $0 < y_1 < \tfrac12$; for the normal universe $f(y_1)\,dy_1 = \dfrac{3\sqrt3}{\pi}\,\dfrac{dy_1}{y_1^2 - y_1 + 1}$.)

d. Normal Universe

For $f(x) = \dfrac{1}{\sqrt{2\pi}}e^{-x^2/2}$, $-\infty < x < \infty$ (called the unit or standard normal distribution or universe), formula (7) becomes

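$$p(y_1) = \frac{6}{(2\pi)^{3/2}} \int_{-\infty}^{\infty} \int_0^{\infty} q\,e^{-\frac12[(r - q)^2 + r^2]} \left\{e^{-\frac12(r - q y_1)^2} + e^{-\frac12[r - q(1 - y_1)]^2}\right\} dq\,dr, \qquad 0 < y_1 < \tfrac12,$$

or

$$p(y_1) = \frac{3\sqrt3}{\pi(1 - y_1 + y_1^2)}, \qquad 0 < y_1 < \tfrac12. \tag{9}$$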



This¹¹ consists of the arc of a Cauchy distribution curve included between the left-hand inflection point and the mode, and is shown in figure 1. Several percentage points obtained from the cumulative distribution are presented in table 2.

Table 2. Percentage points of $y_1$ for the unit normal

$$y_1 = \frac{x' - x''}{x_3 - x_1}; \qquad \Pr\{y_1 < y^0\} = \frac{6}{\pi}\arctan\frac{\sqrt3\,y^0}{2 - y^0}$$

Critical value $y^0$ | Probability $P$ that $y_1$ does not exceed the given value $y^0$
1/11 | 0.1572
1/6 | 0.2983
1/3 | 0.6369

Probability $P$ | Critical value $y^0$ corresponding to the given probability $P$
0.01 | 0.00603
0.05 | 0.02979
0.10 | 0.05874
0.25 | 0.14128
0.50 | 0.26795



11 This distribution has also been obtained by G. R. Seth [10]. (See also footnotes 3 and 16.)



The above table bears out the fact that the ratio $y_1$ is not a good criterion to use for the rejection of outlying observations. Thus, a ratio as marked as one-sixth or less, indicating that the outermost observation is at least five times as distant from the middle one as is the remaining one, may be expected (if the universe is normal) about 30 percent of the time; even when the distances are in the ratio 10 to 1 or more, one by no means has a rare event: it may be expected only a little less often than in one sample out of six.
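The closed form in the heading of table 2 is easy to evaluate and to invert, which makes the percentage points simple to recompute. The Python sketch below is offered as an illustration (the function names `cdf_y1` and `critical_y1` are hypothetical): solving $\tan(\pi P/6) = \sqrt3\,y^0/(2 - y^0)$ for $y^0$ gives $y^0 = 2t/(\sqrt3 + t)$ with $t = \tan(\pi P/6)$; in particular the median works out to $2 - \sqrt3 \approx 0.26795$.

```python
import math


def cdf_y1(y0):
    """Pr{y1 < y0} for a unit normal parent (heading of table 2), 0 <= y0 <= 1/2."""
    return (6.0 / math.pi) * math.atan(math.sqrt(3.0) * y0 / (2.0 - y0))


def critical_y1(p):
    """Inverse of cdf_y1: the y0 with Pr{y1 < y0} = p, 0 <= p <= 1.
    Follows from tan(pi p / 6) = sqrt(3) y0 / (2 - y0)."""
    t = math.tan(math.pi * p / 6.0)
    return 2.0 * t / (math.sqrt(3.0) + t)


if __name__ == "__main__":
    for y0 in (1 / 11, 1 / 6, 1 / 3):
        print(f"y0 = {y0:.4f}   P = {cdf_y1(y0):.4f}")
    for p in (0.01, 0.05, 0.10, 0.25, 0.50):
        print(f"P = {p:.2f}   y0 = {critical_y1(p):.5f}")
```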

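The moments are given by

$$E(y_1) = \frac12 - \frac{3\sqrt3}{2\pi}\ln\frac43 = 0.26209,$$

$$\sigma^2(y_1) = \frac{3\sqrt3}{2\pi} - \frac34 - \frac{27}{4\pi^2}\left(\ln\frac43\right)^2 = 0.020392,$$

$$\sigma(y_1) = \sqrt{\sigma^2(y_1)} = 0.14280.$$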
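The moments can be confirmed by direct numerical integration of (9). The sketch below, with a hypothetical Simpson-rule helper, compares quadrature with the closed forms just given; it is a check of the arithmetic, not a new result.

```python
import math


def density_y1(y):
    """Equation (9): pdf of y1 for a unit normal parent, 0 < y < 1/2."""
    return 3.0 * math.sqrt(3.0) / (math.pi * (1.0 - y + y * y))


def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    total += 4.0 * sum(g(lo + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(g(lo + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0


mass = simpson(density_y1, 0.0, 0.5)
mean = simpson(lambda y: y * density_y1(y), 0.0, 0.5)
second = simpson(lambda y: y * y * density_y1(y), 0.0, 0.5)
sd = math.sqrt(second - mean ** 2)

# Closed forms quoted in the text.
c, L = 3.0 * math.sqrt(3.0) / (2.0 * math.pi), math.log(4.0 / 3.0)
mean_exact = 0.5 - c * L
var_exact = c - 0.75 - (c * L) ** 2
print(f"mass {mass:.10f}  mean {mean:.5f} ({mean_exact:.5f})  sd {sd:.5f} ({math.sqrt(var_exact):.5f})")
```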

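The correlation between the range and $y_1$ is found as follows:

$$\rho(x_3 - x_1, y_1) = \frac{E[(x_3 - x_1)y_1] - E(x_3 - x_1)E(y_1)}{\sigma(x_3 - x_1)\,\sigma(y_1)} = \frac{E(x' - x'') - E(x_3 - x_1)E(y_1)}{\sigma(x_3 - x_1)\,\sigma(y_1)},$$

$$\rho(x_3 - x_1, y_1) = \frac{0.45352 - (1.69257)(0.26209)}{(0.88837)(0.14280)} = 0.0781,$$

on making use of the fact that

$$E(x' - x'') = E(2y_2) = \frac{6 - 3\sqrt3}{\sqrt\pi} = 0.45352, \tag{10}$$

from a result obtained in section 3.2, c, and

$$E(x_3 - x_1) = 2E(x_3) = \frac{3}{\sqrt\pi} = 1.69257,$$

$$\sigma^2(x_3 - x_1) = 2\sigma^2(x_3) - 2\,\mathrm{cov}(x_1, x_3) = 2 - \frac{3(3 - \sqrt3)}{\pi} = 0.78920,$$

from the exact values given by Jones [9].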
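The same arithmetic can be carried out from the closed forms without rounding the intermediate values. In this Python sketch (variable names are our own) the unrounded constants give $\rho \approx 0.078$, which agrees with the value above to the accuracy of the four- and five-figure inputs.

```python
import math

SQRT3, SQRT_PI = math.sqrt(3.0), math.sqrt(math.pi)

e_gap = (6.0 - 3.0 * SQRT3) / SQRT_PI            # E(x' - x''), equation (10)
e_range = 3.0 / SQRT_PI                          # E(x3 - x1) = 2 E(x3)
var_range = 2.0 - 3.0 * (3.0 - SQRT3) / math.pi  # sigma^2(x3 - x1)

c, L = 3.0 * SQRT3 / (2.0 * math.pi), math.log(4.0 / 3.0)
e_y1 = 0.5 - c * L                               # E(y1)
sd_y1 = math.sqrt(c - 0.75 - (c * L) ** 2)       # sigma(y1)

# E[(x3 - x1) * y1] = E(x' - x''), because y1 = (x' - x'') / (x3 - x1).
rho = (e_gap - e_range * e_y1) / (math.sqrt(var_range) * sd_y1)
print(f"E(x'-x'') = {e_gap:.5f}, E(range) = {e_range:.5f}, "
      f"var(range) = {var_range:.5f}, rho = {rho:.4f}")
```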

3.2. The Statistic $y_2 = \tfrac12(x' - x'')$

a. General Formula for Its Distribution 

The development of section 3.1, a, could also be used here, but it will not be given. It is more fruitful to pursue an alternative method adapted to the form of $y_2$ and $y_3$. This will readily yield the joint distribution of $y_2$ and $y_3$ and thus simplify their study.
We first obtain the joint distribution of $x'$, $x''$, $x'''$, where it will be recalled that $x'$ and $x''$ are the two closest observations, $x' > x''$, and $x'''$ is the remaining one, the outlying value, either above or below the closest pair. Writing

$$x'' = u, \qquad x' = v, \qquad x''' = w,$$

we have the transformation $T$ given by

$$T: \begin{cases} x_1 = u,\ x_2 = v,\ x_3 = w & \text{when } x_2 - x_1 < x_3 - x_2, \text{ i.e. } u - 2v + w > 0 \quad (R_1), \\ x_1 = w,\ x_2 = u,\ x_3 = v & \text{when } x_2 - x_1 > x_3 - x_2, \text{ i.e. } v - 2u + w < 0 \quad (R_2). \end{cases}$$

We know the joint distribution of $x_1, x_2, x_3$, namely

$$p(x_1, x_2, x_3) = 3!\,f(x_1)f(x_2)f(x_3), \qquad a < x_1 < x_2 < x_3 < b, \tag{11}$$

and desire that of $u, v, w$ resulting from the transformation $T$. Since the regions of definition become increasingly complex, we shall sacrifice some slight generality by taking $a = 0$, $b = 1$, and reworking the results whenever necessary. This will not be difficult once the general line of procedure has been indicated.

Since the function in (11) is symmetric, the density function for $u, v, w$ remains of the same form. The only difficulty is determining the region over which it is different from zero. By somewhat tedious manipulations, this region may be shown to consist of the portions:¹²

$$R': \begin{cases} 2v - u < w < 1, \quad u < v < \tfrac12(u + 1), \quad 0 < u < 1 & (R_1'), \\ 0 < w < 2u - v, \quad \tfrac12 v < u < v, \quad 0 < v < 1 & (R_2'), \end{cases}$$

so that the pdf for $(u, v, w)$ is

$$g(u, v, w) = \begin{cases} 3!\,f(u)f(v)f(w) & \text{in } R', \\ 0 & \text{elsewhere}. \end{cases} \tag{12}$$



The joint distribution of $u$ $(= x'')$ and $v$ $(= x')$ may then be obtained by integration:



12 Note that the variables $u$, $v$ appear in reverse order in $(R_1')$ compared with $(R_2')$. If the order is kept the same, it will be found that $(R_2')$ will need to be further broken into 2 parts, $(R_{21}')$ and $(R_{22}')$. The present order will therefore be retained in the interest of simplicity. This need occasion no difficulty if care is used when integrating.



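$$h(u, v) = \begin{cases} 3!\,f(u)f(v)\displaystyle\int_{2v-u}^{1} f(w)\,dw, & u < v < \tfrac12(u + 1),\ 0 < u < 1, \\[2ex] 3!\,f(u)f(v)\displaystyle\int_0^{2u-v} f(w)\,dw, & \tfrac12 v < u < v,\ 0 < v < 1. \end{cases} \tag{13}$$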



It should be remembered that this formula holds only if the initial distribution $f(x)$ is nonzero only in the range 0 to 1. For more general ranges $a$ to $b$, the results would be rather complicated.

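The joint distribution of $y_2$ and $y_3$ may be obtained from that of $u, v$ in (13) by the transformation $U$,

$$U: \quad y_2 = \tfrac12(x' - x'') = \tfrac12(v - u), \qquad y_3 = \tfrac12(x' + x'') = \tfrac12(u + v),$$

with Jacobian $\tfrac12$ and inverse

$$U^{-1}: \quad u = -y_2 + y_3, \qquad v = y_2 + y_3. \tag{14}$$

Substitution into (13) presents no problem. The two partial regions in (13) are transformed as follows:

$$\text{first sub-region into } \begin{cases} 0 < y_2 < y_3, & 0 < y_3 < \tfrac14, \\ 0 < y_2 < \tfrac13(1 - y_3), & \tfrac14 < y_3 < 1; \end{cases} \qquad \text{second sub-region into } \begin{cases} 0 < y_2 < \tfrac13 y_3, & 0 < y_3 < \tfrac34, \\ 0 < y_2 < 1 - y_3, & \tfrac34 < y_3 < 1. \end{cases} \tag{15}$$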



Discussion of moments and other properties is most 
easily carried out in connection with the specific 
populations discussed below. 

To find the distribution of $x'''$ $(= w)$ the region $R'$ must first be expressed by changing the order of the variables $u, v, w$ so that the condition involving $w$ is written last, permitting $u$ and $v$ to be integrated out. The procedure is the same as determining new limits when transforming variables or changing the order of integration.

The result of transforming the region and integrating out $u$ and $v$ is

$$p(w) = \int_0^{w/2}\!\!\int_0^{v} g\,du\,dv + \int_{w/2}^{w}\!\int_{2v-w}^{v} g\,du\,dv + \int_w^1\!\int_{(v+w)/2}^{v} g\,du\,dv, \qquad 0 < w < 1, \tag{16}$$

where $g = g(u, v, w)$ is given by (12).
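As an illustration that is not in the original, take the square universe, for which $g = 3! = 6$ throughout $R'$. Working out the three terms of (16) by hand then gives $p(w) = \tfrac34 w^2 + \tfrac34 w^2 + \tfrac32(1 - w)^2 = \tfrac32[w^2 + (1 - w)^2]$, $0 < w < 1$, so the outlying value is pushed toward the ends of the interval. The Python sketch below, with hypothetical helper names, evaluates (16) numerically for a parent density on $(0, 1)$, compares it with that derived form, and checks a few of its consequences ($E(w) = \tfrac12$, $E(w^2) = \tfrac{7}{20}$, $\Pr\{w < 0.1\} = 0.136$) by simulation with an arbitrary seed.

```python
import random


def outlier_density(f, w, n=200):
    """Evaluate (16) for the density of x''' = w, with g(u, v, w) = 6 f(u) f(v) f(w)
    from (12); f must vanish outside (0, 1). Midpoint rule, n points per direction."""
    def g(u, v):
        return 6.0 * f(u) * f(v) * f(w)

    total = 0.0
    # First two terms of (16): 0 < v < w, max(0, 2v - w) < u < v.
    hv = w / n
    for i in range(n):
        v = (i + 0.5) * hv
        lo = max(0.0, 2.0 * v - w)
        hu = (v - lo) / n
        total += hv * hu * sum(g(lo + (j + 0.5) * hu, v) for j in range(n))
    # Third term of (16): w < v < 1, (v + w)/2 < u < v.
    hv = (1.0 - w) / n
    for i in range(n):
        v = w + (i + 0.5) * hv
        lo = 0.5 * (v + w)
        hu = (v - lo) / n
        total += hv * hu * sum(g(lo + (j + 0.5) * hu, v) for j in range(n))
    return total


def square(x):
    """Square universe: f(x) = 1 on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def outlier_density_square(w):
    """Closed form derived above from (16) for the square universe."""
    return 1.5 * (w * w + (1.0 - w) ** 2)


def simulate_outliers(n_triads=50_000, seed=11):
    """Draw x''' directly: the value left over after removing the closest pair."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_triads):
        x1, x2, x3 = sorted(rng.random() for _ in range(3))
        out.append(x3 if x2 - x1 < x3 - x2 else x1)
    return out


if __name__ == "__main__":
    for w in (0.1, 0.3, 0.5, 0.9):
        print(w, outlier_density(square, w), outlier_density_square(w))
```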
Figure 2. Frequency function for $y_2$: $p(y_2) = 12(1 - 4y_2)^2$, $0 < y_2 < \tfrac14$.
b. Rectangular Universe 

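For a rectangular (square) universe

$$f(x) = 1, \qquad 0 < x < 1,$$

(13) becomes, with the aid of (14) and (15),

$$p(y_2, y_3) = \begin{cases} 12(1 - 3y_2 - y_3), & y_2 < y_3 < 1 - 3y_2,\ 0 < y_2 < \tfrac14, \\ 12(-3y_2 + y_3), & 3y_2 < y_3 < 1 - y_2,\ 0 < y_2 < \tfrac14, \end{cases} \tag{17}$$

so that the pdf of $y_2$ is

$$p(y_2) = 12\int_{y_2}^{1-3y_2} (1 - 3y_2 - y_3)\,dy_3 + 12\int_{3y_2}^{1-y_2} (y_3 - 3y_2)\,dy_3 = 12(1 - 4y_2)^2, \qquad 0 < y_2 < \tfrac14.$$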

Its graph is sketched in figure 2. It is seen that small values of the difference $(x' - x'')$ appear to be overwhelmingly frequent in samples of three from a rectangular population, thus giving a possible intuitive explanation of the fact that the dispersion is much less than in the case of true duplicates. Moments of this distribution are

$$E(y_2) = \tfrac{1}{16} = 0.0625, \qquad \sigma(y_2) = \frac{\sqrt{15}}{80} = 0.04841.$$

It should be recalled that these moments apply only to sampling from the rectangular (square) population in the form $f(x) = 1$, $0 \le x \le 1$; $f(x) = 0$ elsewhere. For the case of a symmetrical rectangular population with unit standard deviation, see sec. 3.2, d, (4).
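A quick simulation of the square universe gives an independent check on these moments and on the unit-variance figures in table 1b, col. 4 (including the value 0.9083 quoted in footnote 3). This is a hedged sketch: the seed and sample size are arbitrary, the function name is hypothetical, and the factor $\sqrt{12}$ rescales the square universe to unit standard deviation, as explained in section 3.2, d.

```python
import math
import random
import statistics


def closest_pair_square(n_triads=60_000, seed=5):
    """Lists of y2 = (x' - x'')/2 and y3 = (x' + x'')/2 for the square universe."""
    rng = random.Random(seed)
    y2s, y3s = [], []
    for _ in range(n_triads):
        x1, x2, x3 = sorted(rng.random() for _ in range(3))
        lo, hi = (x1, x2) if x2 - x1 < x3 - x2 else (x2, x3)
        y2s.append((hi - lo) / 2.0)
        y3s.append((hi + lo) / 2.0)
    return y2s, y3s


y2s, y3s = closest_pair_square()
scale = math.sqrt(12.0)   # square universe -> unit standard deviation
print("E(y2)    ", round(statistics.fmean(y2s), 5), " exact 1/16 =", 1 / 16)
print("sd(y2)   ", round(statistics.pstdev(y2s), 5), " exact sqrt(15)/80 =", round(math.sqrt(15) / 80, 5))
print("E(x'-x'')", round(2 * scale * statistics.fmean(y2s), 4), " unit variance, table 1b: 0.4330")
print("sd(y3)   ", round(scale * statistics.pstdev(y3s), 4), " unit variance, table 1b: 0.9083")
```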



c. Normal Universe 



Since the limits are no longer 0 to 1, the distribution of $y_2$ has to be worked out anew.

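For the sample of size three, the two functional forms of $y_2 = \tfrac12(x' - x'')$ are

$$y_2 = \begin{cases} \tfrac12(x_2 - x_1) & \text{when } x_2 - x_1 < x_3 - x_2, \\ \tfrac12(x_3 - x_2) & \text{when } x_2 - x_1 > x_3 - x_2. \end{cases} \tag{18}$$

This becomes, putting $x_2 - x_1 = s_1$, $x_3 - x_2 = s_2$,

$$y_2 = \begin{cases} \tfrac12 s_1 & \text{when } s_1 < s_2, \\ \tfrac12 s_2 & \text{when } s_1 > s_2. \end{cases}$$

The desired distribution will then be obtained from the joint df of $s_1$ and $s_2$ by integrating out under the above conditions (18) separately and replacing the "free" $s_i$ by $2y_2$.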

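The joint df of $s_1$ and $s_2$ is found by the usual method of transformation from the joint df of the basic ordered variables $x_1$, $x_2$, and $x_3$ as follows. The transformation

$$x_2 - x_1 = s_1, \qquad x_3 - x_2 = s_2, \qquad x_1 + x_2 + x_3 = s_3$$

carries the joint df

$$f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = \frac{3!}{(2\pi)^{3/2}}\,e^{-\frac12(x_1^2 + x_2^2 + x_3^2)}\,dx_1\,dx_2\,dx_3$$

into

$$g(s_1, s_2, s_3)\,ds_1\,ds_2\,ds_3 = \frac{3!}{3(2\pi)^{3/2}}\,e^{-\frac{s_3^2}{6} - \frac13(s_1^2 + s_1 s_2 + s_2^2)}\,ds_1\,ds_2\,ds_3, \qquad 0 < s_1 < \infty,\ 0 < s_2 < \infty,\ -\infty < s_3 < \infty.$$

Integrating out $s_3$ from $-\infty$ to $\infty$ gives

$$h(s_1, s_2)\,ds_1\,ds_2 = \frac{\sqrt3}{\pi}\,e^{-\frac13(s_1^2 + s_1 s_2 + s_2^2)}\,ds_1\,ds_2, \qquad 0 < s_1 < \infty,\ 0 < s_2 < \infty.$$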

We can then obtain the distribution of $y_2$ as

$$f(y_2)\,dy_2 = \left[\int_{s_2 > s_1} h(s_1, s_2)\,ds_2\right]_{s_1 = 2y_2} ds_1 + \left[\int_{s_1 > s_2} h(s_1, s_2)\,ds_1\right]_{s_2 = 2y_2} ds_2$$

$$= \frac{\sqrt3}{\pi}\left[\int_{2y_2}^{\infty} e^{-\frac13(4y_2^2 + 2y_2 s_2 + s_2^2)}\,ds_2 \cdot 2\,dy_2 + \int_{2y_2}^{\infty} e^{-\frac13(s_1^2 + 2y_2 s_1 + 4y_2^2)}\,ds_1 \cdot 2\,dy_2\right]$$

$$= \frac{12\sqrt3}{\pi}\,e^{-y_2^2}\left[\int_{y_2}^{\infty} e^{-3t^2}\,dt\right] dy_2, \qquad 0 < y_2 < \infty,$$

on using the transformations $t = \tfrac13(s_2 + y_2)$ and $t = \tfrac13(s_1 + y_2)$ in the first and second terms respectively, and combining. This change of variable in the infinite integral is legitimate, for both old and new integrals converge, and the transformation $s = 3t - y_2$ from $s$ to $t$ has a continuous derivative (namely 3) which does not vanish in the range of integration.
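Because $\int_{y}^{\infty} e^{-3t^2}\,dt = \frac{\sqrt\pi}{2\sqrt3}\operatorname{erfc}(\sqrt3\,y)$, the density just obtained can also be written $f(y_2) = \frac{6}{\sqrt\pi}e^{-y_2^2}\operatorname{erfc}(\sqrt3\,y_2)$, which is convenient for computation. The Python sketch below is an illustration: it truncates the infinite range at $y_2 = 8$, uses a hypothetical Simpson-rule helper, checks that the density integrates to one and reproduces the moments given below, and compares the mean with a small simulation (arbitrary seed).

```python
import math
import random
import statistics


def density_y2(y):
    """f(y2) for a unit normal parent: (12 sqrt(3)/pi) e^{-y^2} int_y^inf e^{-3t^2} dt."""
    return 6.0 / math.sqrt(math.pi) * math.exp(-y * y) * math.erfc(math.sqrt(3.0) * y)


def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    total += 4.0 * sum(g(lo + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(g(lo + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0


mass = simpson(density_y2, 0.0, 8.0)
mean = simpson(lambda y: y * density_y2(y), 0.0, 8.0)
sd = math.sqrt(simpson(lambda y: y * y * density_y2(y), 0.0, 8.0) - mean ** 2)

rng = random.Random(3)
sim = []
for _ in range(40_000):
    x1, x2, x3 = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
    sim.append(min(x2 - x1, x3 - x2) / 2.0)      # y2 = (x' - x'')/2
print(f"mass {mass:.8f}, mean {mean:.5f}, sd {sd:.5f}, simulated mean {statistics.fmean(sim):.4f}")
```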

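The first two moments of $y_2$ involve integrals of the following types:

$$\int_0^{\infty}\!\!\int_{ky}^{\infty} y\,e^{-Q}\,dx\,dy = \frac{\sqrt\pi}{\Delta}\left(\sqrt a - \frac{a k'}{\sqrt q}\right),$$

$$\int_0^{\infty}\!\!\int_{ky}^{\infty} y^2\,e^{-Q}\,dx\,dy = \frac{2a}{\Delta^{3/2}}\left(\frac{\pi}{2} - \arctan\frac{2ak'}{\sqrt\Delta}\right) - \frac{a k'}{\Delta q},$$

in which

$$Q = ax^2 + bxy + cy^2, \qquad \Delta = 4ac - b^2 > 0, \qquad k' = k + \frac{b}{2a}, \qquad q = ak^2 + bk + c, \qquad a, c > 0.$$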

These values give

$$E(y_2) = \frac{6 - 3\sqrt3}{2\sqrt\pi} = 0.22676,$$

$$\sigma^2(y_2) = \frac12 - \frac{63 - 33\sqrt3}{4\pi} = 0.03508, \qquad \sigma(y_2) = 0.18730.$$
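The two integral formulas apply to the expression for $f(y_2)$ before the substitution, with $x$ standing for $s$ and $y$ for $y_2$: then $Q = \tfrac13(x^2 + 2xy + 4y^2)$, i.e. $a = \tfrac13$, $b = \tfrac23$, $c = \tfrac43$, $k = 2$, and the first two moments are $\tfrac{4\sqrt3}{\pi}$ times the two integrals. The sketch below is our own illustration of that bookkeeping, not a reproduction of the author's worksheets; the function name is hypothetical.

```python
import math


def moment_integrals(a, b, c, k):
    """Closed forms for I1 = int_0^inf int_{ky}^inf y e^{-Q} dx dy and
    I2 = int_0^inf int_{ky}^inf y^2 e^{-Q} dx dy, with Q = a x^2 + b x y + c y^2."""
    delta = 4.0 * a * c - b * b
    if a <= 0 or c <= 0 or delta <= 0:
        raise ValueError("need a, c > 0 and 4ac - b^2 > 0")
    kp = k + b / (2.0 * a)             # k'
    q = a * k * k + b * k + c
    i1 = math.sqrt(math.pi) / delta * (math.sqrt(a) - a * kp / math.sqrt(q))
    angle = math.pi / 2.0 - math.atan(2.0 * a * kp / math.sqrt(delta))
    i2 = 2.0 * a / delta ** 1.5 * angle - a * kp / (delta * q)
    return i1, i2


i1, i2 = moment_integrals(a=1 / 3, b=2 / 3, c=4 / 3, k=2.0)
factor = 4.0 * math.sqrt(3.0) / math.pi
e_y2 = factor * i1
var_y2 = factor * i2 - e_y2 ** 2
print(f"E(y2) = {e_y2:.5f}, var(y2) = {var_y2:.5f}, sd(y2) = {math.sqrt(var_y2):.5f}")
```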

d. Comparison of y2 with other measures of two observations 

(1) "True duplicates" (sample size $n = 2$).

For samples of 2, the closest pair $(x'', x')$ is simply the entire sample:

$$x' = x_2, \qquad x'' = x_1,$$

and

$$y_2 = \tfrac12(x' - x'') = \tfrac12(x_2 - x_1) = \tfrac12 R_2,$$

where $R_n$ will be used to denote the range of a sample of $n$. In table 1b, $y_2$ (for samples of 2) is denoted by $p$ (Part A, cols. 3, 6).

Since a main objective is to make comparisons for samples from rectangular and from normal populations, it is first necessary to put them on a comparable basis.¹³ The normal population studied is symmetrical and has standard deviation unity. The rectangular population with these same characteristics of location and scale is

$$g(x) = \frac{1}{\sqrt{12}}, \quad -\sqrt3 < x < \sqrt3; \qquad g(x) = 0 \text{ otherwise},$$

since the standard deviation of the rectangular (square) population previously considered is $1/\sqrt{12}$. The quantities needed in the comparisons below involving the rectangular distribution will be most conveniently obtained by computing them for the simple case of a square distribution and then multiplying by the scale magnifying factor $\sqrt{12}$. Evidently the statistic $y_2$ will not be affected by the shift in location of the population.

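The results are (here $2y_2$ is simply the range, $x_2 - x_1$):

Rectangular universe $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.

$$E(y_2) = E\left(\frac{x_2 - x_1}{2}\right) = E\left(\tfrac12 R_2\right) = \tfrac16\sqrt{12} = 0.5774$$

$$\sigma(y_2) = \sigma\left(\tfrac12 R_2\right) = \frac{1}{2\sqrt{18}}\sqrt{12} = 0.4082.$$

(From the distribution of the range $p(R_n) = n(n-1)R_n^{n-2}(1 - R_n)$, $0 < R_n < 1$, for n = 2, combined with a transformation which multiplies the scale of the variable by $\sqrt{12}$.)

Normal universe $f(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$, $-\infty < x < \infty$.

$$E(y_2) = E\left(\tfrac12 R_2\right) = \frac{1}{\sqrt\pi} = 0.5642$$

$$\sigma(y_2) = \sigma\left(\tfrac12 R_2\right) = \left(\frac12 - \frac1\pi\right)^{1/2} = 0.4263$$

(From Jones [9].)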

(2) Lowest (or highest)¹⁴ pair out of three (n = 3).

For samples of three, $x_1 < x_2 < x_3$, we have the following results for $s = (x_2 - x_1)/2$:

Rectangular universe $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.



13 This consideration did not arise when studying the statistic $y_1$, because, being a ratio of lengths, it is unaffected by changes in scale of the parent population.

14 Since the parent distribution is in each case symmetrical, the results for the lowest and highest pair are identical.
$$E\left(\frac{x_2 - x_1}{2}\right) = \tfrac18\sqrt{12} = 0.4330, \qquad \sigma\left(\frac{x_2 - x_1}{2}\right) = \left(\frac{3 \cdot 12}{320}\right)^{1/2} = 0.3354$$

(From Wilks [6].)
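Normal universe $f(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$, $-\infty < x < \infty$.

$$E\left(\frac{x_2 - x_1}{2}\right) = \frac{3}{4\sqrt\pi} = 0.4231, \qquad \sigma\left(\frac{x_2 - x_1}{2}\right) = \left(\frac12 - \frac{9 + 6\sqrt3}{16\pi}\right)^{1/2} = 0.3379$$

(From Jones [9].)

(3) Half-range, $\frac12(x_3 - x_1)$ (n = 3).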



The analogous quantities which describe the spread 
in the set of 3 are: 

Rectangular universe $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.

$$\sigma\left(\frac{x_3 - x_1}{2}\right) = \sqrt{\tfrac{1}{80} \cdot 12} = 0.3873$$

(From the distribution of the range $p(R_n)$ for n = 3.)
The reason for using one-fourth rather than one-half the range runs somewhat as follows. The distance $x_2 - x_1$ between two adjacent values in a sample of 3 can take values from zero all the way up to the range of all three. Thus, in a rough average sense, this distance represents some fraction of the range, and it happens that in the cases we have considered, this fraction is remarkably closely given by one-half, so that half this distance, namely $\frac12(x_2 - x_1)$, is, in the same sense, given by one-fourth the range.



Normal universe $f(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$, $-\infty < x < \infty$.

$$E\left(\frac{x_3 - x_1}{4}\right) = \frac{3}{4\sqrt\pi} = 0.4231, \qquad \sigma\left(\frac{x_3 - x_1}{2}\right) = 0.4442$$

(From Jones [9].)

(4) Closest pair in samples of three. 

For comparison, moments of $y_2' = 2y_2 = x' - x''$ for samples of three are presented here, based on the moments of $y_2$ found above (secs. 3.2, b, and 3.2, c), and also adjusted, in the case of the rectangular universe, for moving the mean of the distribution to the origin and increasing the scale by the factor $\sqrt{12}$.



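Rectangular universe $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.

$$E(y_2') = 2E(y_2) = 2 \cdot \tfrac{1}{16}\sqrt{12} = 0.4330, \qquad \sigma(y_2') = 2\sigma(y_2) = 2\left(\tfrac{3}{1280}\right)^{1/2}\sqrt{12} = 0.3354$$

Normal universe $f(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$, $-\infty < x < \infty$.

$$E(y_2') = 2E(y_2) = 2(0.226761) = 0.4535, \qquad \sigma(y_2') = 2\sigma(y_2) = 2(0.18730) = 0.3746$$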

The reason for using twice $y_2$, rather than $y_2$ itself, for comparison with the previous values is analogous to that given in section 3.2, d, (3) for using one-fourth rather than one-half the range. The restriction to the closest pair means that the distance $x' - x''$ cannot vary to the same extent as $x_2 - x_1$: its size is limited at most to half the range, while $x_2 - x_1$ can take values up to the range of the sample. Thus it is to be expected that $x' - x''$, that is, $2y_2$, is the quantity comparable to $(x_2 - x_1)/2$, which in turn, by the argument in section 3.2, d, (3), is comparable to $(x_3 - x_1)/4$. It turns out that these relationships are exactly true in the case of the parent (adjusted) rectangular distribution, and remarkably close in the case of the parent unit normal.
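These comparisons can be reproduced approximately by a small Monte Carlo experiment. This is a simulation sketch, not the analytical method of the paper; the seed, trial count and names are arbitrary choices. Both universes are standardized to mean 0 and standard deviation 1. For each universe the sketch averages four spread measures:

- the half-range of true duplicates;
- half the lowest gap, $(x_2 - x_1)/2$;
- one-fourth the range, $(x_3 - x_1)/4$;
- the closest-pair spread $x' - x'' = 2y_2$ in samples of three.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def draw(universe, rng):
    """One observation with mean 0 and standard deviation 1."""
    if universe == "normal":
        return rng.gauss(0.0, 1.0)
    # Rectangular universe g(x) = 1/sqrt(12) on (-sqrt(3), sqrt(3)).
    return rng.uniform(-SQRT3, SQRT3)

def mean_spreads(universe, trials=100_000, seed=12345):
    """Monte Carlo means of four spread measures for the given universe."""
    rng = random.Random(seed)
    totals = {"duplicates_half_range": 0.0, "lowest_pair_half_gap": 0.0,
              "quarter_range": 0.0, "closest_pair_2y2": 0.0}
    for _ in range(trials):
        a, b = draw(universe, rng), draw(universe, rng)
        totals["duplicates_half_range"] += abs(a - b) / 2.0   # y2 for n = 2
        x1, x2, x3 = sorted(draw(universe, rng) for _ in range(3))
        totals["lowest_pair_half_gap"] += (x2 - x1) / 2.0
        totals["quarter_range"] += (x3 - x1) / 4.0
        totals["closest_pair_2y2"] += min(x2 - x1, x3 - x2)   # x' - x''
    return {name: total / trials for name, total in totals.items()}
```

For example, `mean_spreads("rectangular")` should give about 0.433 for the last three measures. `mean_spreads("normal")["closest_pair_2y2"]` should come out near 0.4535.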

3.3. The Statistic y3

As for $y_2$, the distribution of

$$y_3 = \frac{x' + x''}{2}$$

in the general situation involves a complicated argument. This is not only because of the complexity of the distribution function, but also because of the involved character of the region over which the integration must be performed. Therefore it is not considered profitable to discuss the properties of $y_3$ from a general viewpoint. Instead, its properties will be illustrated for individual universes, to show how they may be derived in any given case.

a. Rectangular Universe 

We cannot use the joint df of $y_2$, $y_3$ in the form (17), because $y_2$ cannot be integrated out of the region as written. It is therefore necessary to reverse the order of the variables in the expression for the region. The result is

$$v(y_3) = \begin{cases} 4y_3(3 - 7y_3), & 0 < y_3 < \tfrac14, \\ 2(1 - 2y_3 + 2y_3^2), & \tfrac14 < y_3 < \tfrac34, \\ 4(1 - y_3)(7y_3 - 4), & \tfrac34 < y_3 < 1, \end{cases}$$

whose graph is sketched in figure 3.
For the moments we have

$$E(y_3) = \tfrac12, \qquad \sigma(y_3) = \left(\tfrac{11}{160}\right)^{1/2} = 0.2622.$$

As in section 3.2, b, the rectangular distribution to which these apply is the square form $f(x) = 1$, $0 < x < 1$; $f(x) = 0$ elsewhere. For the form with standard deviation unity, see section 3.3, c, (4).
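Because each piece of $v(y_3)$ is a quadratic polynomial, its total mass and moments can be checked exactly with Python's `fractions` module. In this sketch each piece is expanded into coefficients of $1, y, y^2$; the names `PIECES`, `piece_value` and `moment` are illustrative only. It confirms that the pieces join continuously at $y_3 = 1/4$, that the density integrates to 1, and that $E(y_3) = 1/2$ and $\sigma^2(y_3) = 11/160$.

```python
from fractions import Fraction as F

# v(y3) for the square universe, piece by piece: (lower, upper, [c0, c1, c2]),
# where the density on (lower, upper) is c0 + c1*y + c2*y**2.
PIECES = [
    (F(0), F(1, 4), [F(0), F(12), F(-28)]),     # 4 y (3 - 7 y)
    (F(1, 4), F(3, 4), [F(2), F(-4), F(4)]),    # 2 (1 - 2 y + 2 y^2)
    (F(3, 4), F(1), [F(-16), F(44), F(-28)]),   # 4 (1 - y)(7 y - 4)
]

def piece_value(coeffs, y):
    """Evaluate c0 + c1*y + c2*y**2 exactly."""
    return sum(c * y ** j for j, c in enumerate(coeffs))

def moment(r):
    """Exact r-th raw moment E(y3**r) under the piecewise density."""
    total = F(0)
    for lo, hi, coeffs in PIECES:
        for j, c in enumerate(coeffs):
            p = j + r + 1  # antiderivative of c * y**(j + r)
            total += c * (hi ** p - lo ** p) / p
    return total
```

Here `moment(2) - moment(1) ** 2` evaluates to `Fraction(11, 160)`, and its square root is 0.2622.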



Figure 3. Frequency function of y3 for a rectangular universe.



b. Normal Universe 

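The statistic $y_3$ takes the functional forms

$$y_3 = \frac{x' + x''}{2} = \begin{cases} \dfrac{x_1 + x_2}{2} & \text{when } x_2 - x_1 < x_3 - x_2 \ (\text{i.e., } x_3 > 2x_2 - x_1), \\[2mm] \dfrac{x_2 + x_3}{2} & \text{when } x_2 - x_1 > x_3 - x_2 \ (\text{i.e., } x_1 < 2x_2 - x_3). \end{cases} \tag{19}$$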



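As a first step in obtaining the distribution of $y_3$, the joint df of $x'$ and $x''$ is determined from that of $x_1$, $x_2$, $x_3$. Writing

$$J(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 = 3!\,f(x_1)f(x_2)f(x_3)\,dx_1\,dx_2\,dx_3, \qquad -\infty < x_1 < x_2 < x_3 < \infty,$$

where, on the right-hand side, $f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$, we have, on integrating over the above conditions,

$$
\begin{aligned}
g(x',x'')\,dx'\,dx'' &= 6\left[\int_{x_3 > 2x_2 - x_1} f(x_1)f(x_2)f(x_3)\,dx_3 \cdot dx_1\,dx_2\right]_{x_1 = x'',\ x_2 = x'} \\
&\quad + 6\left[\int_{x_1 < 2x_2 - x_3} f(x_1)f(x_2)f(x_3)\,dx_1 \cdot dx_2\,dx_3\right]_{x_2 = x'',\ x_3 = x'} \\
&= 6 f(x')f(x'')\left[1 - F(2x' - x'') + F(2x'' - x')\right]dx'\,dx'', \qquad -\infty < x'' < x' < \infty,
\end{aligned} \tag{20}
$$

where

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$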
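The desired distribution is then derived by means of the transformation

$$y_2 = \tfrac12(x' - x''), \qquad y_3 = \tfrac12(x' + x''),$$

giving

$$v(y_3) = \frac{6}{\pi\sqrt{2\pi}}\,e^{-y_3^2}\int_0^\infty e^{-y_2^2}\left[\int_{y_3 + 3y_2}^{\infty} e^{-t^2/2}\,dt + \int_{-\infty}^{y_3 - 3y_2} e^{-t^2/2}\,dt\right]dy_2, \qquad -\infty < y_3 < \infty. \tag{21}$$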
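Equation (21) can be evaluated numerically. Using $\int_a^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}\,\operatorname{erfc}(a/\sqrt2)$, the constant collapses to $3/\pi$:

$$v(y_3) = \frac{3}{\pi}\,e^{-y_3^2}\int_0^\infty e^{-y_2^2}\left[\operatorname{erfc}\frac{y_3 + 3y_2}{\sqrt2} + \operatorname{erfc}\frac{3y_2 - y_3}{\sqrt2}\right]dy_2.$$

The Python sketch below evaluates this with nested composite Simpson quadrature on truncated ranges. The quadrature and the truncation limits are assumptions of this illustration. It checks that the total mass is close to 1 and that the standard deviation is close to 0.7986.

```python
import math

def simpson(g, lo, hi, n):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def v(y3, upper=7.0, n=280):
    """Density (21) of y3, the midpoint of the closest pair, unit normal parent."""
    root2 = math.sqrt(2.0)

    def inner(y2):
        # Probability weight that the third observation lies outside [A, B].
        tails = math.erfc((y3 + 3.0 * y2) / root2) + math.erfc((3.0 * y2 - y3) / root2)
        return math.exp(-y2 * y2) * tails

    return 3.0 / math.pi * math.exp(-y3 * y3) * simpson(inner, 0.0, upper, n)

def y3_mass_and_sd(limit=7.0, n=280):
    """Total probability and standard deviation of y3 (its mean is 0 by symmetry)."""
    mass = simpson(v, -limit, limit, n)
    second = simpson(lambda y: y * y * v(y), -limit, limit, n)
    return mass, math.sqrt(second / mass)
```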



Alternative derivation. The author is indebted to Professor J. Wolfowitz of Cornell University for the following interesting method of deriving the above result.

The method is to take two of the three observations (which may be done in $\binom{3}{2} = 3$ ways) and express the fact that they are the closest two. This is done by writing the condition that the third is at a greater distance from either one than the interval between the two selected.

This may be shown schematically as follows:

    [region where x1 may lie]  A |<- 2y2 ->| x'' |<- y2 ->| y3 |<- y2 ->| x' |<- 2y2 ->| B  [region where x3 may lie]

If $x'$ and $x''$ are the closest pair, then

(i) Half the distance between them is $y_2 = \frac{x' - x''}{2}$.

(ii) The abscissa of the mid-point between them is $y_3 = \frac{x' + x''}{2}$.

(iii) The condition that $x''$ and $x'$ are the closest two is equivalent to the condition that $x_1$ lies to the left of $A = x'' - 2y_2 = y_3 - 3y_2$ or $x_3$ lies to the right of $B = x' + 2y_2 = y_3 + 3y_2$.

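Combining condition (iii) with the fact that $x' = y_3 + y_2$ and $x'' = y_3 - y_2$ (either from the diagram or by inverting the transformation in (i) and (ii)) gives the joint df of $y_3$, $y_2$:

$$\psi(y_3,y_2)\,dy_3\,dy_2 = \frac{3}{\pi}\,e^{-\frac12[(y_3+y_2)^2 + (y_3-y_2)^2]}\left[\int_{y_3+3y_2}^{\infty}\frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt + \int_{-\infty}^{y_3-3y_2}\frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt\right]2\,dy_3\,dy_2, \qquad 0 < y_2 < \infty,\ -\infty < y_3 < \infty.$$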



The odd moments of $y_3$ vanish by symmetry. The even moments of $y_3$ require the evaluation of integrals of the type

$$\phi_{2k}(a,p) = \iiint z^{2k}\,e^{-(ax^2 + by^2 + cz^2)}\,dx\,dy\,dz.$$

This may be accomplished by first putting $k = 0$, differentiating $\phi_0$¹⁵ with respect to $p$, and obtaining an integral of the form

$$\frac{\partial \phi_0(a,p)}{\partial p} = \iint y\,e^{-Q}\,dx\,dy,$$

where $Q = kx^2 + lxy + my^2$, $\Delta = 4km - l^2 > 0$, $k, m > 0$. Integrating back yields the value of $\phi_0(a,p)$. Next, differentiating this value with respect to $c$ gives the even moments of $y_3$.

We thus obtain the results

$$E(y_3^k) = 0, \quad k \text{ odd}; \qquad \sigma^2(y_3) = \frac12 + \frac{\sqrt3}{4\pi}; \qquad \sigma(y_3) = 0.7986.$$

c. Comparisons with Other Measures of Two Observations 

Since for samples of two $y_3$ is merely the midrange, $\frac12(x_1 + x_2)$, we have the following results:¹⁷

(1) "True duplicates" (sample size n = 2)

Rectangular universe: $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.

$$E(y_3) = 0, \qquad \sigma(y_3) = \tfrac12\sqrt2 = 0.7071$$

Normal universe: $f(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$, $-\infty < x < \infty$.

$$E(y_3) = 0, \qquad \sigma(y_3) = \tfrac12\sqrt2 = 0.7071$$



15 This and the other steps of the analysis in the case of multiple integrals 
may be shown to be valid by methods analogous to the usual ones for simple 
integrals. 

16 Acknowledgment is due G. R. Seth, who first discovered and communicated 
this value to the author after deriving it by a different method, which the author 
has found useful at other points of this paper. (See also footnote 3.) 

17 As in section 3.2, d, the reference for the case of the rectangular universe is 
Wilks [6]; for the normal universe, Jones [9]. The values for the rectangular uni- 
verse are computed by finding the moments of y3 for the square universe and
then adjusting by the location and scale factors described in section 3.2, d, (1). 
(2) Lowest¹⁸ pair out of three (n = 3)

Rectangular universe:¹⁹

$$E\left(\frac{x_1 + x_2}{2}\right) = -\frac{\sqrt3}{4} = -0.4330$$

Normal universe:

$$\sigma\left(\frac{x_1 + x_2}{2}\right) = \left(\frac12 - \frac{9 - 2\sqrt3}{16\pi}\right)^{1/2} = 0.6244$$

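(3) Midrange of all three measurements, $\frac12(x_1 + x_3)$ (n = 3)

Rectangular universe:¹⁹

$$\sigma\left(\frac{x_1 + x_3}{2}\right) = \left(\tfrac{3}{10}\right)^{1/2} = 0.5477$$

Normal universe:

$$\sigma\left(\frac{x_1 + x_3}{2}\right) = \left(\frac12 - \frac{\sqrt3}{4\pi}\right)^{1/2} = 0.6018$$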



(4) Closest pair in samples of three

For comparison, moments of $y_3 = (x' + x'')/2$, the average of the closest pair out of three, are presented here. They are based on the moments of $y_3$ found above (secs. 3.3, a and 3.3, b) and, in the case of the rectangular distribution, adjusted to the location and scale factors used several times previously.

Rectangular universe:¹⁹

$$\sigma(y_3) = \left(\tfrac{11}{160}\right)^{1/2}\sqrt{12} = 0.9083$$

Normal universe:

$$E(y_3) = 0, \qquad \sigma(y_3) = \left(\frac12 + \frac{\sqrt3}{4\pi}\right)^{1/2} = 0.7986$$
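These location comparisons can also be approximated by simulation. The Python sketch below rests on stated assumptions: the seed and trial count are arbitrary, and the function and key names are introduced here. As in footnote 19, both universes are standardized to mean 0 and standard deviation 1. The sketch reports the mean and standard deviation of four averages of two observations: true duplicates, the lowest pair, the midrange of three, and the closest pair $y_3$.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def draw(universe, rng):
    """One observation with mean 0 and standard deviation 1."""
    if universe == "normal":
        return rng.gauss(0.0, 1.0)
    return rng.uniform(-SQRT3, SQRT3)  # rectangular: g(x) = 1/sqrt(12) on (-sqrt3, sqrt3)

def mean_sd(values):
    """Population mean and standard deviation, with compensated summation."""
    m = math.fsum(values) / len(values)
    return m, math.sqrt(math.fsum((x - m) ** 2 for x in values) / len(values))

def location_estimates(universe, trials=60_000, seed=2024):
    """Monte Carlo (mean, standard deviation) of four two-observation averages."""
    rng = random.Random(seed)
    samples = {"duplicates": [], "lowest_pair": [], "midrange": [], "closest_pair": []}
    for _ in range(trials):
        samples["duplicates"].append((draw(universe, rng) + draw(universe, rng)) / 2.0)
        x1, x2, x3 = sorted(draw(universe, rng) for _ in range(3))
        samples["lowest_pair"].append((x1 + x2) / 2.0)
        samples["midrange"].append((x1 + x3) / 2.0)
        # y3: the mean of whichever adjacent pair is closer together.
        y3 = (x1 + x2) / 2.0 if x2 - x1 < x3 - x2 else (x2 + x3) / 2.0
        samples["closest_pair"].append(y3)
    return {name: mean_sd(v) for name, v in samples.items()}
```

The closest-pair average is the most variable of the four in both universes. This is consistent with the values listed above.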



18 Analogous results for the highest pair are obtainable from symmetry considerations.

19 Adjusted as already mentioned in previous sections: $g(x) = 1/\sqrt{12}$, $-\sqrt3 < x < \sqrt3$; $g(x) = 0$ elsewhere.




Figure 4. Distribution of the outlying value in a sample of three from the rectangular distribution.

3.4. The Extreme Value x'" 

For the rectangular distribution we obtained the joint density function (12) of $x'$, $x''$, $x'''$ in section 3.2, a, above. This consisted of just the product of the individual density functions (unity for the rectangular universe), but defined over a complicated-appearing region. Following the principles elucidated in that section, we obtain the df of $x'''$ by first expressing the region suitably, then integrating out $x'$ and $x''$. The result is given by equation (16), which for the rectangular distribution becomes, with $w = x'''$,

$$\phi(w) = 3\left(w^2 - w + \tfrac12\right), \qquad 0 < w < 1, \tag{22}$$

which is the parabola sketched in figure 4.



Although considerable attention has been devoted to the anomalous values or outliers $x_1$ or $x_3$ ($x_n$ for samples of n) separately, these have not, so far as is known to the author, been united into a single statistic of the type $x'''$. Thus (22) actually exhibits a distribution of an outlier as distinct from a ("one-end") extreme value $x_1$ (or $x_n$).
The moments of $w$ are

$$E(w) = \tfrac12, \qquad \sigma(w) = \tfrac{1}{10}\sqrt{10} = 0.3162.$$
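Equation (22) and these moments can be checked exactly with Python's `fractions` module. The sketch also splits $\phi(w)$ according to which end of the sample supplies the outlier. If $x_3 = w$ is the outlier, the contribution is $6 \cdot w^2/4 = \frac32 w^2$; if $x_1 = w$ is the outlier, it is $\frac32(1-w)^2$. This split is our own derivation from the square-universe joint density, not a formula quoted from the paper. The constant names are illustrative.

```python
from fractions import Fraction as F

# phi(w) = 3 (w^2 - w + 1/2) on 0 < w < 1, stored as coefficients of 1, w, w^2.
PHI = [F(3, 2), F(-3), F(3)]

# Contribution when x3 = w is the outlier: (3/2) w^2.
X3_OUTLIER = [F(0), F(0), F(3, 2)]
# Contribution when x1 = w is the outlier: (3/2)(1 - w)^2.
X1_OUTLIER = [F(3, 2), F(-3), F(3, 2)]

def raw_moment(r, coeffs=PHI):
    """Exact integral over (0, 1) of w**r times the polynomial with these coefficients."""
    return sum(c / (j + r + 1) for j, c in enumerate(coeffs))
```

For instance, `raw_moment(2) - raw_moment(1) ** 2` gives `Fraction(1, 10)`, whose square root is 0.3162. Each end supplies the outlier with probability 1/2.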



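The joint distribution of $w$ and $y_3$, given without proof, is as follows:

$$f(y_3,w) = \begin{cases} 4(w - y_3), & \tfrac14 w < y_3 < w, & 0 < w < 1, \\ 12y_3, & 0 < y_3 < \tfrac14 w, & 0 < w < 1, \\ 4(y_3 - w), & w < y_3 < \tfrac14(w + 3), & 0 < w < 1, \\ 12(1 - y_3), & \tfrac14(w + 3) < y_3 < 1, & 0 < w < 1. \end{cases}$$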



This distribution may be used to obtain the correlation between $x'''$ and $y_3 = \frac12(x' + x'')$, which turns out to be -0.37689, or slightly under -3/8. This seems to imply a slight tendency for a pair of close, small values in a sample of 3 from a rectangular population to be associated with a relatively large outlying value, and conversely for close high values.
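The correlation can be checked by simulation. The Python sketch below draws samples of three from the square universe, takes $w = x'''$ and $y_3 = \frac12(x' + x'')$, and computes their sample correlation; the seed and trial count are arbitrary choices. For reference, integrating the joint density above gives $\operatorname{Cov}(w, y_3) = -1/32$. Combined with $\sigma^2(w) = 1/10$ and $\sigma^2(y_3) = 11/160$, this yields $\rho = -5/(4\sqrt{11}) \approx -0.37689$, in agreement with the value quoted.

```python
import math
import random

# Exact value from the joint density: Cov(w, y3) = -1/32, var(w) = 1/10, var(y3) = 11/160.
EXACT_RHO = -5.0 / (4.0 * math.sqrt(11.0))

def outlier_midpoint_corr(trials=100_000, seed=7):
    """Monte Carlo correlation of x''' (the outlier) with y3 (closest-pair midpoint)
    for samples of three from the square universe f(x) = 1 on (0, 1)."""
    rng = random.Random(seed)
    ws, ys = [], []
    for _ in range(trials):
        x1, x2, x3 = sorted(rng.random() for _ in range(3))
        if x2 - x1 < x3 - x2:   # (x1, x2) is the closest pair, x3 the outlier
            ws.append(x3)
            ys.append((x1 + x2) / 2.0)
        else:                   # (x2, x3) is the closest pair, x1 the outlier
            ws.append(x1)
            ys.append((x2 + x3) / 2.0)
    mw = math.fsum(ws) / trials
    my = math.fsum(ys) / trials
    cov = math.fsum((w - mw) * (y - my) for w, y in zip(ws, ys)) / trials
    vw = math.fsum((w - mw) ** 2 for w in ws) / trials
    vy = math.fsum((y - my) ** 2 for y in ys) / trials
    return cov / math.sqrt(vw * vy)
```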

The distribution of the extreme value x'" would 
also be of interest for the normal and other dis- 
tributions. Although the analytical methods neces- 
sary for handling the integrals encountered could be 
developed and extended on the basis of the procedures 
thus far given, this would not appear to be warranted 
for the purposes of this paper. 

4. Conclusion 

This paper has developed methods for deriving the 
exact distributions and related properties of certain 
statistics not heretofore considered which throw light 
on some aspects of the behavior of very small samples 
encountered in experimental laboratory work. These 
statistics, designated y1, y2, y3, depend not solely on
the order of the observations but also take their 
relative closeness into account. The aim was to 
provide only the mathematical theory, for samples 
of three, and present only the more interesting results 
and comparisons (summarized in table 1) and not 
attempt to use the results as a basis for setting up 
criteria for the rejection of observations. 

The results have some bearing on the old question 
of the rejection of outlying observations. They show 
that at least for the normal, rectangular, and right 
triangular universes, for a sample as small as three a 
rejection criterion based on the relative sizes of the 



two gaps formed by the three measurements is hardly 
a satisfactory one, for high ratios between these gaps 
occur with surprising frequency, as indicated in 
table 2, even when all three observations come from 
the same universe. 